ABSTRACT

In self-adapted testing, examinees are allowed to 
choose the difficulty of each item to be presented immediately before 
attempting it. Previous research has demonstrated that self-adapted 
testing leads to better performance than do fixed-order tests and is 
preferred by examinees. The present study examined the strategies 
that 29 college students used in selecting items during a 
self-adapted test. After completing the Test Anxiety Inventory, 
subjects took the self-adapted test. The test contained 40 items 
sorted into 8 categories of difficulty based on Rasch model 
estimates. Three test-taking strategies were identified. Most 
subjects adopted a flexible strategy in which they generally selected 
easier items following failure and harder items following success. 
Some subjects adopted a "failure intolerant" strategy in which they 
generally selected easier items following failure and items of the 
same difficulty after success. Finally, some subjects adopted a 
"failure tolerant" strategy in which they chose items of the same 
difficulty level after failure, but harder items after success. The 
failure-tolerant strategy was associated with lower estimated ability 
than were the other two strategies. This finding may reflect the 
attributions examinees adopting that strategy make and the effort 
they expend following failures. The results provide general support 
for the value of continued development of self-adapted testing. 
(Author/TJH) 
Abstract

In self adapted testing, examinees are allowed to choose the difficulty of each 
item to be presented immediately before attempting it. Previous research 
(Rocklin & O'Donnell, 1987) has demonstrated that self adapted testing leads 
to better performance than fixed order tests and is preferred by examinees. 
The present study examined the strategies that subjects used in selecting items 
during a self adapted test. Three strategies were identified. Most subjects 
adopted a flexible strategy in which they generally selected easier items 
following failure and harder items following success. Some subjects adopted 
a "failure intolerant" strategy in which they generally selected easier items 
following failure and items of the same difficulty after success. Finally, some 
subjects adopted a "failure tolerant" strategy in which they chose items of the 
same difficulty after failure, but harder items after success. The failure 
tolerant strategy was associated with lower estimated ability than the other 
two strategies. This may reflect the attributions examinees adopting that 
strategy make and the effort they expend following failure. 
Individual Differences in Item Selection in 
Computerized Self Adapted Testing 
Thomas Rocklin 

Test constructors, including classroom instructors, often begin a test 
with a few relatively easy items with the intent of minimizing the impact of 
examinees' test anxiety on their performance. Indeed, this practice is 
suggested in a number of textbooks on measurement (e.g., Kaplan & Sacuzzo, 
1982). Specifying the difficulty of the first few items is a very simple 
approach to item sequencing¹ within a test. More complete specifications might 
include item sequences based on monotonically increasing difficulty, 
monotonically decreasing difficulty, "spiraling" difficulty, or random 
assignment of items to positions within the test. The effects of each of these 
item sequencing specifications on examinee performance have been investigated, 
but with mixed results (Laffitte, 1984). 

Nonetheless, it seems likely that item sequencing makes a difference in 
examinee performance. When item difficulty is manipulated between tests 
instead of within tests, it interacts with examinee test anxiety in influencing 
performance on the examination (Rocklin & Thompson, 1985). In particular, 
the relationship between performance and test anxiety appears to be negative 
for difficult tests, but positive or curvilinear for easier tests. Given that item 
difficulty and examinee test anxiety appear to interact, the relation between 
item sequencing and performance is likely to be complex. 



¹ Throughout this paper, "item sequencing" refers to sequencing in terms of item difficulty, 
rather than sequencing based on curriculum, objectives, or other content factors. 
Testing technologies differ in the extent to which the test constructor 
has control over the sequence in which items are attempted. In traditional 
paper and pencil tests, the test constructor can select the order in which items 
are presented, and therefore exert modest control over the order in which 
they are attempted. Examinees, however, normally have the ultimate 
control over the sequence in which items are attempted because they have the 
option to skip items that they find too difficult. Many books on "college 
survival" (e.g., Kesselman-Turkel & Peterson, 1981; American College Testing 
Program, 1989) contain advice to do just this. In any situation in which the 
examinee does not attempt all items on an examination, this strategy means 
that the examinee will actually adjust not only the item sequencing, but the 
overall test difficulty as well. Presumably, examinees make decisions about 
item sequencing based partly on their ability, and partly on current 
motivational and affective states. For example, an examinee who is feeling 
particularly anxious may seek a very easy item to gain confidence. On the other 
hand, a very calm examinee may enjoy the challenge of a difficult item. 

Two kinds of test difficulty can be distinguished. The first is objective, 
or psychometric, difficulty. This is simply the average item difficulty, defined 
in terms of the proportion of examinees passing the item or in terms of an 
item response theory difficulty parameter. The second kind of difficulty, sub- 
jective difficulty, is likely to be more important to the examinee's motivation. 
Subjective difficulty is based on an examinee's perception of the probability 
that he or she has answered the item correctly. 

Computerized adaptive testing (CAT) gives the examinee no control 
over the sequence in which items are attempted. In general, only one item is 
available at a time and the difficulty of that item is selected algorithmically 
based on the examinee's previous responses. In CAT, although the objective 
difficulty of the examination will differ from examinee to examinee, the 
number of items answered correctly, and therefore the subjective difficulty, 
will be relatively constant, depending on the item selection algorithm and the 
item format (i.e., number of alternatives in a multiple choice item). 

Thus, CAT provides tests that are individually tailored to examinees' 
ability levels, but are, in contrast to traditional paper and pencil tests, 
completely insensitive to individual examinees' motivational and affective states. 
In an attempt to allow examinees to make choices about item sequencing 
based on relatively full information, I have explored the potential of a 
technology I call self adapted testing (SAT; Rocklin & O'Donnell, 1987). In SAT, 
examinees take a computer administered test, but instead of attempting items 
selected by algorithm, each examinee specifies the difficulty of the items he or 
she attempts on an item by item basis. 

In making item selections, examinees taking a SAT have access to two 
kinds of information that are not generally available to examinees taking a 
paper and pencil test. First, the examinee receives item by item feedback, so 
that the subjective estimate of difficulty he or she makes is better informed. 
Second, the examinee is provided with normative information about the 
objective difficulty of the items from which he or she is to choose. Thus, in 
SAT, in contrast to paper and pencil testing, the examinee has access to the 
information necessary to make "rational" choices in item sequencing. 

The success of SAT depends on examinees' ability to select item 
difficulties in ways that optimize their performance. In the initial evaluation 
of SAT (Rocklin & O'Donnell, 1987), examinees randomly assigned to take a 
SAT performed better (i.e., had higher ability estimates) than subjects taking 
either a relatively easy test or a relatively difficult test. In addition, there 
was no overall loss of precision of measurement associated with SAT, although 
there was an interaction between examinees' test anxiety and type of test. 

Nearly all subjects in that study (86%) progressed from easier items at 
the beginning of the test to more difficult items at the end. Beyond this, 
though, little is known about the strategies examinees used for item selection. 
The purpose of this study was to examine these strategies in detail. In 
particular, the study was designed to answer these questions: (1) Can the item 
selection strategies of examinees be well-modeled using simple rules? (2) 
What are these rules? (3) Are the item selection strategies associated with the 
level or variability of examinees' performance or with examinees' test 
anxiety? In addition, SAT provides an environment for evaluating examinees' 
item sequencing preferences that might be relevant to other testing technolo- 
gies. 

Method 

This study is based upon data collected in a previous study (Rocklin & 
O'Donnell, 1987). Subjects (university students) were recruited through 
campus wide advertisements offering $5.00 for participation in a one hour 
experiment and randomly assigned to take a hard, an easy, or a self adapted test 
based on the verbal section (analogies, antonyms, and synonyms in a five 
alternative multiple choice format) of the Scholastic Aptitude Test (College 
Entrance Examination Board, 1980). Only the data from the 29 subjects 
assigned to the self-adapted test are considered in this study. 
After completing the Test Anxiety Inventory (Spielberger, 1980), 
subjects took the self adapted test. Forty items were sorted into eight 
categories of difficulty based on Rasch model estimates computed from previously 
collected data. Subjects specified the difficulty of the item they wished to 
attempt, responded to an item selected from that category, and were informed 
whether or not their response was correct. If no new items were available in 
the category, the subject was directed to choose another category. The test 
ended when 20 items had been answered or 10 minutes had elapsed, 
whichever came first. 
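
The administration procedure just described can be sketched as a short loop. This is only an illustration under stated assumptions, not the software used in the study: the function and parameter names are hypothetical, the examinee's choice and response are supplied as callbacks, the redirection to another category is modeled by offering only categories that still hold unused items, and the item presented within a category is simply the next unused one (the text does not say how it was picked).

```python
import time


def run_self_adapted_test(items_by_category, choose_category, present_item,
                          max_items=20, time_limit_s=600, clock=time.monotonic):
    """Administer one self adapted test session (illustrative sketch).

    items_by_category: dict mapping a difficulty category to its list of items.
    choose_category:   callback(available_categories, record) -> chosen category.
    present_item:      callback(item) -> True if answered correctly, else False.
    Returns a list of (category, item, correct) tuples in order of presentation.
    """
    # Work on a copy so the caller's item pool is left untouched.
    remaining = {c: list(items) for c, items in items_by_category.items()}
    record = []
    start = clock()
    # Stop after max_items answers or once the time limit has elapsed.
    while len(record) < max_items and clock() - start < time_limit_s:
        available = sorted(c for c, items in remaining.items() if items)
        if not available:
            break  # the whole pool is exhausted
        category = choose_category(available, record)
        if category not in available:
            raise ValueError("chosen category has no unused items")
        item = remaining[category].pop(0)  # assumption: next unused item
        correct = present_item(item)       # the examinee then sees this feedback
        record.append((category, item, correct))
    return record
```

With 5 items in each of 8 categories, an examinee who always asks for the easiest category still open would work through categories 1 to 4 before reaching the 20 item limit.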

Results 

Three simple models of item selection strategies were evaluated. In 
each, item selection was guided by whether the previous item was answered 
correctly. No attempt was made to model selection of the first item. In the 
"failure tolerant" model, the examinee was assumed to choose an item of the 
same difficulty as the previous item following an incorrect response and an 
item of the next higher difficulty following a correct response. In the "failure 
intolerant" model, the examinee was assumed to choose an item of the next 
lower difficulty than the previous item following an incorrect response and an 
item of the same difficulty following a correct response. In the "flexible" 
model, the examinee was assumed to choose an item of the next lower 
difficulty than the previous item following an incorrect response and an item 
of the next higher difficulty following a correct response. Each model was used 
to simulate a response vector for each subject. This vector was the same length 
as the actual response vector and constrained by the availability of only 5 
items in each difficulty category. The goodness of fit of each model was 
evaluated by computing the square root of the mean squared difference between 
corresponding elements in the two vectors. 
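
The three models and the fit index can be sketched in a few lines of Python. This is an illustration, not the original analysis code. The names are hypothetical, and several details that the text leaves open are assumptions: each simulated choice steps from the model's own previous choice using the subject's actual right/wrong outcomes, a choice that falls outside categories 1 to 8 or lands on an exhausted category is redirected to the nearest category with an unused item (ties go to the easier one), and the unmodeled first choice is copied from the subject and included in the fit index.

```python
import math

# Step in difficulty category after (an incorrect response, a correct response).
STRATEGIES = {
    "failure tolerant": (0, +1),
    "failure intolerant": (-1, 0),
    "flexible": (-1, +1),
}

N_CATEGORIES = 8        # difficulty categories, numbered 1 (easiest) to 8
ITEMS_PER_CATEGORY = 5  # 40 items sorted into 8 categories


def nearest_available(target, used):
    """Assumed fallback: closest category with an unused item, ties to the easier."""
    target = min(max(target, 1), N_CATEGORIES)
    open_cats = [c for c in range(1, N_CATEGORIES + 1)
                 if used[c] < ITEMS_PER_CATEGORY]
    return min(open_cats, key=lambda c: (abs(c - target), c))


def simulate(strategy, first_choice, correct):
    """Simulate the categories chosen under one strategy.

    first_choice: the category the subject actually chose first (not modeled).
    correct:      the subject's actual outcomes, one boolean per item attempted.
    """
    step_after_wrong, step_after_right = STRATEGIES[strategy]
    used = {c: 0 for c in range(1, N_CATEGORIES + 1)}
    choices = [first_choice]
    used[first_choice] += 1
    # The outcome of the last item has no following choice to predict.
    for was_correct in correct[:-1]:
        step = step_after_right if was_correct else step_after_wrong
        choice = nearest_available(choices[-1] + step, used)
        used[choice] += 1
        choices.append(choice)
    return choices


def rmse(simulated, actual):
    """Square root of the mean squared difference between the two vectors."""
    return math.sqrt(sum((s - a) ** 2 for s, a in zip(simulated, actual))
                     / len(actual))


# Made-up example data for one subject (not taken from the study).
actual = [3, 4, 4, 3, 4]
outcomes = [True, False, False, True, True]
fits = {name: rmse(simulate(name, actual[0], outcomes), actual)
        for name in STRATEGIES}
best = min(fits, key=fits.get)
```

Because the index is expressed in category units, smaller values mean that the simulated choices stay closer to the choices the subject actually made.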

For each subject, the best, second best, and worst fitting models were 
identified. For the best fitting model for each subject, goodness of fit ranged 
from .8 to 3.1 with a mean of 1.9 (on a scale of 1 to 8, corresponding to the 8 
difficulty categories available to the subjects; lower values indicate better 
fit). Thus, some subjects behaved very much as one of the models predicted, 
while others were somewhat more idiosyncratic. 

The mean goodness of fit and number of subjects best fit by each 
model are shown in Table 1. Most, but not all, subjects were best fit by the 
flexible model. Those subjects who were fit by the failure intolerant model 
were worse fit than those fit by other models (F[2, 26] = 4.25, MSe = .296). 

The ability of each subject was estimated from a one parameter model 
using item difficulties estimated from previously collected data (Wright, 1977). 
The mean ability estimates and mean standard errors of those estimates are 
shown in Table 1. The ability estimate means differ significantly (F[2, 26] = 
4.96, MSe = 1.16), with subjects best fit by the failure tolerant model receiving 
the lowest mean ability estimates. The mean standard errors do not differ 
significantly from one another. 
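
As an illustration of how an ability estimate and its standard error can be obtained under a one parameter (Rasch) model when item difficulties are treated as known, the sketch below uses maximum likelihood with Newton-Raphson iteration. The text does not say which estimation procedure was used, so the choice of estimator, the function names, and the example difficulties are all assumptions.

```python
import math


def rasch_p(theta, b):
    """Probability of a correct response for ability theta and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate_ability(difficulties, responses, tol=1e-6, max_iter=100):
    """Maximum likelihood ability estimate and its standard error.

    difficulties: known difficulty (in logits) of each item attempted.
    responses:    1 for a correct response, 0 for an incorrect one.
    """
    raw = sum(responses)
    if raw == 0 or raw == len(responses):
        # The likelihood has no finite maximum for zero or perfect scores.
        raise ValueError("no finite estimate for a zero or perfect score")
    theta = 0.0
    for _ in range(max_iter):
        probs = [rasch_p(theta, b) for b in difficulties]
        info = sum(p * (1.0 - p) for p in probs)  # test information at theta
        step = (raw - sum(probs)) / info          # Newton-Raphson update
        theta += step
        if abs(step) < tol:
            break
    probs = [rasch_p(theta, b) for b in difficulties]
    info = sum(p * (1.0 - p) for p in probs)
    return theta, 1.0 / math.sqrt(info)


# Hypothetical difficulties and responses, for illustration only.
theta, se = estimate_ability([-1.5, -0.5, 0.0, 0.5, 1.5], [1, 1, 0, 1, 0])
```

In a self adapted test the difficulties entering this calculation are those of the items the examinee chose, so two examinees with the same number correct can receive different ability estimates and different standard errors.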

Finally, the test anxiety scores (from the TAI), as shown in Table 1, do 
not differ significantly from one another. 

Discussion 

Given the information available to the examinee and the sole goal of 
estimating his or her ability, the most "rational" of the three strategies 
evaluated here is the flexible strategy. In fact, examinees adopting the 
flexible strategy are essentially administering themselves a stradaptive test 
(Weiss, 1973). This strategy, in which incorrect answers are followed by 
attempts at easier items and correct answers are followed by attempts at harder 
items, was adopted by 18 (62%) of the subjects in this study. The other 11 
subjects were better fit by a model in which difficulty was adjusted only after 
an incorrect answer (14% or 4 subjects) or only after a correct answer (24% or 
7 subjects). These 11 subjects must have (a) had goals different from or in 
addition to the goal of estimating ability, (b) attended to information other 
than item difficulty (e.g., their emotional state), or both (a) and (b). 

The flexible and the failure intolerant strategies were both 
associated with better performance than the failure tolerant strategy. Because 
there are so few examinees who selected items using the failure tolerant 
strategy, it is difficult to draw firm conclusions about them. They do not 
stand out in terms of test anxiety or any of the other attributes assessed in this 
study. 

It seems plausible that these examinees are "failure avoiding" students 
(Covington & Omelich, 1985). Failure avoiding students (as opposed to 
success oriented and failure accepting students) have responded to repeated 
academic failure by trying to avoid responsibility for their failures. Thus, in 
this study, when they failed an item, they selected an equally hard item because 
they could then attribute their failure to the difficulty of the item, rather than 
their own low ability. When they answered an item correctly, they selected 
an item of greater difficulty, to ensure that if failure ensued it could be 
attributed to the item difficulty. 
Although further research will be required to understand why 
examinees make the item sequencing choices that they make, the results of this 
study provide strong evidence that there are indeed individual differences in 
item sequencing preferences. Given relatively full information, examinees 
differ in the sequence in which they want to attempt test items. Further, 
there appear to be at least two item sequencing strategies (the flexible and the 
failure intolerant) that are associated with equally good performance. 
Examinees who chose the failure intolerant strategy in this study would 
presumably find a typical CAT, which more closely resembles the flexible 
strategy, inhospitable. These examinees appreciate the chance to savor success. 
The present study does not establish a causal connection between item 
sequencing strategy and performance, but it seems likely that examinees who 
prefer the failure intolerant strategy would do more poorly in a CAT than a SAT. 

The study reported in this paper provides general support for the value 
of continued development of SAT. In particular, the fact that not all exami- 
nees taking a SAT choose the same sort of item sequence, combined with the 
lack of evidence for superior performance being associated with any particular 
sequence, implies that examinees can take advantage of the self-tailoring 
afforded by SAT. It is also clear from questionnaire data (Rocklin & O'Donnell, 
1987) that they appreciate the opportunity to make item sequencing choices. 

What, finally, does the study reported in this paper tell us about the 
general issue of item sequencing? It is unlikely that there is a single item 
sequence for conventional tests that is optimal for all examinees. If the goal 
of the test constructor is to improve the performance of all examinees, he or 
she might be best off making the difficulties of items explicit to examinees 
(e.g., by grouping items into hard, medium and easy sections on the test form) 
and allowing examinees to make their own sequencing decisions. 
Table 1 

Characteristics of Subjects Best Fit by Each Model 

                                   Failure       Failure 
                                   Intolerant    Tolerant     Flexible 

Goodness of fit 
    Mean                             2.60          1.62         1.86 
    SD                                .61           .50          .55 

Estimated Ability 
    Mean                             1.00           .58      [illegible] 
    SD                                .75          1.03         1.14 

Std. Error of Ability Estimate 
    Mean                              .58           .60          .56 
    SD                            [illegible]       .08          .05 

Test Anxiety 
    Mean                            39.25         39.86        35.33 
    SD                              15.11          8.03         8.06 

n                                       4             7           18 






References 

American College Testing Program (1989). Building better study skills. Iowa 
City, IA: ACT. 

College Entrance Examination Board (1980). An SAT: Test and technical data 
for the Scholastic Aptitude Test administered in March 1980. Princeton: 
Educational Testing Service. 

Covington, M. V., & Omelich, C. L. (1985). Ability and effort valuation among 
failure-avoiding and failure-accepting students. Journal of Educational 
Psychology, 77, 446-459. 

Kaplan, R. M., & Sacuzzo, D. P. (1982). Psychological testing: Principles, 
applications, and issues. Monterey, CA: Brooks Cole. 

Kesselman-Turkel, J., & Peterson, F. (1981). Test-taking strategies. Chicago: 
Contemporary Books. 

Laffitte, R. G., Jr. (1984). Effects of item order on achievement test scores and 
students' perception of test difficulty. Teaching of Psychology, 11, 212-213. 

Rocklin, T., & O'Donnell, A. M. (1987). Self-adapted testing: A performance-
improving variant of computerized adaptive testing. Journal of Educational 
Psychology, 79, 315-319. 

Rocklin, T., & Thompson, J. M. (1985). Interactive effects of test anxiety, test 
difficulty, and feedback. Journal of Educational Psychology, 77, 368-372. 

Spielberger, C. D. (1980). Preliminary professional manual for the Test Anxiety 
Inventory. Palo Alto, CA: Consulting Psychologists Press. 






Weiss, D. J. (1973). The stratified adaptive computerized ability test (Research 
Report 73-3). University of Minnesota, Department of Psychology, 
Psychometric Methods Program. 

Wright, B. D. (1977). Solving measurement problems with the Rasch model. 
Journal of Educational Measurement, 14, 97-116. 






